Rapid Acoustic Model Adaptation Using Inverse MLLR-based Feature Generation
Abstract
We propose a technique for generating a large number of target-speaker-like speech features: a transformation matrix converts prepared speech features from many speakers into features similar to those of the target speaker. The system needs only a very small amount of the target speaker’s utterances to generate these features, which lets it adapt the acoustic model efficiently from that small amount of data. To evaluate the proposed method, we prepared 100 reference speakers and 12 target (test) speakers. We conducted experiments on an isolated word recognition task using a speech database collected in real PC-based distributed environments, and compared the proposed method with MLLR, MAP, and a method theoretically equivalent to SAT. The experimental results showed that the proposed method needed a significantly smaller amount of the target speaker’s utterances than conventional MLLR, MAP, and SAT.
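The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea of applying an affine transform in reverse. It assumes a hypothetical MLLR-style transform (A, b), estimated from a few target-speaker utterances, that maps target-speaker space to a reference speaker's space (y ≈ A x + b); applying its inverse, x ≈ A⁻¹(y − b), to reference-speaker frames then yields target-like frames. All function names, and the direction of the transform, are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch: names and the transform direction are assumptions,
# not the paper's actual formulation.

def apply_affine(A, b, x):
    """Return A x + b for a square matrix A (list of rows) and vectors b, x."""
    return [sum(a * v for a, v in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

def solve(A, y):
    """Solve A z = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y_i] for row, y_i in zip(A, y)]  # augmented matrix
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("transformation matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def apply_inverse_affine(A, b, y):
    """Return A^{-1} (y - b), undoing apply_affine."""
    return solve(A, [y_i - b_i for y_i, b_i in zip(y, b)])

def reference_to_target_like(frames, A, b):
    """Map each reference-speaker frame to a target-like frame (assumed direction)."""
    return [apply_inverse_affine(A, b, frame) for frame in frames]

# Toy 2-dimensional example; real speech features would be e.g. 39-dimensional.
A = [[2.0, 0.0], [1.0, 1.0]]
b = [1.0, -1.0]
reference_frames = [apply_affine(A, b, [3.0, 4.0])]
target_like = reference_to_target_like(reference_frames, A, b)
```

The sketch solves a linear system per frame instead of forming the inverse matrix explicitly; with many frames, inverting A once and reusing it would be the more economical choice.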
Similar resources
Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition
A novel speech feature generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used. All of these adaptation methods need adaptation data. However, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve th...
Training Robust Acoustic Models Using Features of Pseudo-Speakers Generated by Inverse CMLLR Transformations
In this paper a novel speech feature generation-based acoustic model training method is proposed. For decades, speaker adaptation methods have been widely used. All existing adaptation methods need adaptation data. However, our proposed method creates speaker-independent acoustic models that cover not only known but also unknown speakers. We do this by adopting inverse maximum likelihood linear ...
A novel target-driven MLLR adaptation algorithm with multi-layer structure
This paper presents a novel target-driven MLLR adaptation algorithm with a multi-layer structure, based on a thorough analysis of MLLR using the generation of regression class trees. The new algorithm is built on the target-driven principle. It generates the regression classes dynamically, based on the outcome of the previous MLLR transformation. The regression classes are defined ...
Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis
In this paper, we investigate the effectiveness of speaker adaptation for various essential components in deep neural network based speech synthesis, including acoustic models, acoustic feature extraction, and post-filters. In general, a speaker adaptation technique, e.g., maximum likelihood linear regression (MLLR) for HMMs or learning hidden unit contributions (LHUC) for DNNs, is applied to a...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Publication date: 2010